With advanced imaging, sequencing, and profiling technologies, multi-omics data are becoming increasingly available and hold promise for many healthcare applications such as cancer diagnosis and treatment. Multimodal learning for integrative multi-omics analysis can help researchers and practitioners gain deep insights into human diseases and improve clinical decisions. However, several challenges are hindering development in this area, including the availability of easily accessible open-source tools. This survey aims to provide an up-to-date overview of the data challenges, fusion approaches, datasets, and software tools from several new perspectives. We identify and investigate various omics data challenges that can help us understand the field better. We categorize fusion approaches comprehensively to cover existing methods in this area. We collect existing open-source tools to facilitate their broader utilization and development. We explore a broad range of omics data modalities and a list of accessible datasets. Finally, we summarize future directions that can potentially address existing gaps and answer the pressing need to advance multimodal learning for multi-omics data analysis.
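The simplest fusion approach covered by such surveys is early fusion, i.e., concatenating per-sample feature vectors from each omics modality before feeding them to a model. A minimal sketch, with entirely hypothetical feature values and dimensions:

```python
# Minimal sketch of early (concatenation-based) multi-omics fusion;
# the two modalities and their feature values are hypothetical toys.
def early_fusion(gene_expr, methylation):
    """Concatenate per-sample feature vectors from two omics modalities."""
    assert len(gene_expr) == len(methylation), "samples must be aligned"
    return [g + m for g, m in zip(gene_expr, methylation)]

# Two toy samples: 3 expression features and 2 methylation features each.
expr = [[0.1, 0.5, 0.2], [0.9, 0.3, 0.7]]
meth = [[0.4, 0.6], [0.2, 0.8]]
fused = early_fusion(expr, meth)  # each fused vector has 3 + 2 = 5 features
```

Intermediate and late fusion instead combine learned representations or per-modality predictions; the concatenation above is only the simplest baseline.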
Predicting drug-target interaction is key to drug discovery. Recent deep learning-based methods have shown promising performance, but two challenges remain: (i) how to explicitly model and learn local interactions between drugs and targets for better prediction and interpretation; (ii) how to generalize prediction performance to novel drug-target pairs from different distributions. In this work, we propose DrugBAN, a deep bilinear attention network (BAN) framework with domain adaptation, to explicitly learn pairwise local interactions between drugs and targets and adapt to out-of-distribution data. DrugBAN operates on drug molecular graphs and target protein sequences to perform prediction, with conditional domain adversarial learning to align the learned interaction representations across different distributions for better generalization to novel drug-target pairs. Experiments on three benchmark datasets under both in-domain and cross-domain settings show that DrugBAN achieves the best overall performance against five state-of-the-art baselines. In addition, visualizing the learned bilinear attention maps provides interpretable insights into the prediction results.
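The pairwise local interaction idea can be illustrated with a toy bilinear map between per-atom drug features and per-residue protein features. This is a hand-rolled sketch with hypothetical names and sizes, not the paper's actual implementation:

```python
# Toy bilinear interaction map between drug-atom features and
# protein-residue features: score[i][j] = d_i^T W t_j.
# Feature values, dimensions, and W are hypothetical.
def bilinear_map(drug_feats, prot_feats, W):
    """Score every atom/residue pair via a bilinear form."""
    scores = []
    for d in drug_feats:
        row = []
        for t in prot_feats:
            s = sum(d[k] * W[k][l] * t[l]
                    for k in range(len(d)) for l in range(len(t)))
            row.append(s)
        scores.append(row)
    return scores

drug = [[1.0, 0.0], [0.0, 1.0]]               # 2 atoms, 2-dim features
prot = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]   # 3 residues, 2-dim features
W = [[1.0, 0.0], [0.0, 1.0]]                  # identity interaction matrix
attn = bilinear_map(drug, prot, W)            # 2 x 3 interaction map
```

With an identity `W`, each score reduces to a dot product between one atom's and one residue's features; a learned `W` lets the model weight cross-feature interactions, which is what makes the resulting map interpretable as a pairwise attention.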
Cross-domain recommendation (CDR) can help customers find more satisfying items in different domains. Existing CDR models mainly use common users or mapping functions as the bridge between domains, but exploration of fully exploiting extra knowledge across domains has been very limited. In this paper, we propose to incorporate a knowledge graph (KG) into CDR, which enables items in different domains to share knowledge. To this end, we first construct a new dataset, AmazonKG4CDR, from the Freebase KG and a subset of Amazon review data (two domain pairs: movies-music, movies-books). This new dataset helps bridge knowledge between within-domain and cross-domain items in CDR. We then propose a new framework, KG-aware Neural Collective Matrix Factorization (KG-NeuCMF), which leverages the KG to enrich item representations. It first learns item embeddings via a graph convolutional autoencoder to capture both domain-specific and domain-general knowledge from adjacent and higher-order neighbours in the KG. We then maximize the mutual information between item embeddings learned from the KG and those learned from the user-item matrix, so as to establish cross-domain relationships for better CDR. Finally, we conduct extensive experiments on the newly constructed dataset and demonstrate that our model significantly outperforms the best-performing baselines.
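The neighbour-aggregation step that enriches item embeddings with KG knowledge can be sketched as a single mean-pooling graph-convolution pass. The graph, features, and function name below are toy assumptions, not the AmazonKG4CDR data or the paper's exact layer:

```python
# One simplified graph-convolution step: each item averages its own
# features with those of its knowledge-graph neighbours.
# Graph structure and feature values are hypothetical.
def gcn_step(features, neighbors):
    """Mean-pool each node's features with its neighbours' features."""
    out = []
    for i, f in enumerate(features):
        group = [f] + [features[j] for j in neighbors[i]]
        out.append([sum(dim) / len(group) for dim in zip(*group)])
    return out

feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 items, 2-dim features
nbrs = {0: [1], 1: [0, 2], 2: [1]}            # toy KG adjacency
updated = gcn_step(feats, nbrs)
```

Stacking such steps lets an item absorb information from higher-order KG neighbours, which is how domain-general knowledge can flow between items that never co-occur in the user-item matrix.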
Automatic anatomical landmark localization has made great strides by leveraging deep learning methods in recent years. The ability to quantify the uncertainty of these predictions is a vital component needed for these methods to be adopted in clinical settings, where it is imperative that erroneous predictions are caught and corrected. We propose Quantile Binning, a data-driven method to categorize predictions by uncertainty with estimated error bounds. Our framework can be applied to any continuous uncertainty measure, allowing straightforward identification of the best subset of predictions with accompanying estimated error bounds. We facilitate easy comparison between uncertainty measures by constructing two evaluation metrics derived from Quantile Binning. We compare and contrast three epistemic uncertainty measures (two baselines, and a proposed method combining aspects of the two), derived from two heatmap-based landmark localization model paradigms (U-Net and patch-based). We show results across three datasets, including a publicly available Cephalometric dataset. We illustrate how filtering out gross mispredictions caught in our Quantile Bins significantly improves the proportion of predictions under an acceptable error threshold. Finally, we demonstrate that Quantile Binning remains effective on landmarks with high aleatoric uncertainty caused by inherent landmark ambiguity, and offer recommendations on which uncertainty measure to use and how to use it. The code and data are available at https://github.com/schobs/qbin.
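The core binning step can be sketched as follows: rank predictions by a continuous uncertainty score, split them into equal-frequency (quantile) bins, and report the worst observed error per bin as a rough bound. This is a simplified illustration with toy values, not the authors' exact procedure:

```python
# Simplified sketch of Quantile Binning: equal-frequency bins over an
# uncertainty score, with a naive per-bin error bound (the max observed
# error). Uncertainty and error values below are hypothetical.
def quantile_bins(uncertainties, errors, n_bins):
    """Return index bins ordered from low to high uncertainty, plus bounds."""
    order = sorted(range(len(uncertainties)), key=lambda i: uncertainties[i])
    size = len(order) // n_bins
    bins = [order[b * size:(b + 1) * size] for b in range(n_bins)]
    bounds = [max(errors[i] for i in b) for b in bins]
    return bins, bounds

unc = [0.9, 0.1, 0.5, 0.2, 0.8, 0.4]   # per-prediction uncertainty
err = [4.0, 0.5, 1.0, 0.6, 3.0, 0.9]   # localization error (e.g. in mm)
bins, bounds = quantile_bins(unc, err, 3)
# The lowest-uncertainty bin carries the smallest estimated error bound,
# so filtering out the highest bin removes the gross mispredictions.
```

In practice the error bounds would be estimated on a held-out set rather than the evaluated predictions themselves; the sketch only shows the binning mechanics.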
In video action recognition, transformers consistently reach state-of-the-art accuracy. However, many models are too heavyweight for the average researcher with limited hardware resources. In this work, we explore the limitations of video transformers for lightweight action recognition. We benchmark 13 video transformers and baselines across 3 large-scale datasets and 10 hardware devices. Our study is the first to evaluate the efficiency of action recognition models in depth across multiple devices and to train a wide range of video transformers under the same conditions. We categorize current methods into three classes and show that composite transformers that augment convolutional backbones are best at lightweight action recognition, despite lacking accuracy. Meanwhile, attention-only models need more motion modelling capabilities, and standalone attention-block models currently incur too much latency overhead. Our experiments conclude that current video transformers are not yet capable of lightweight action recognition on par with traditional convolutional baselines, and that the previously mentioned shortcomings need to be addressed to bridge this gap. Code to reproduce our experiments will be made publicly available.
Efficient video action recognition remains a challenging problem. One large model after another takes over the state of the art on the Kinetics dataset, but real-world efficiency evaluations are often lacking. In this work, we fill this gap and investigate the use of transformers for efficient action recognition. We propose a novel, lightweight action recognition architecture, VideoLightFormer. In a factorized fashion, we carefully extend the 2D convolutional Temporal Segment Network with transformers, while maintaining spatial and temporal video structure throughout the entire model. Existing methods often resort to one of two extremes: they either apply huge transformers to video features, or minimal transformers to highly pooled video features. Our method differs from them by keeping the transformer models small while leveraging the full spatiotemporal feature structure. We evaluate VideoLightFormer in a high-efficiency setting on the temporally demanding EPIC-KITCHENS-100 and Something-Something-V2 (SSV2) datasets and find that it achieves a better mix of efficiency and accuracy than existing state-of-the-art models, with the exception of the Temporal Shift Module on SSV2.
We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs. This is accomplished through two primary modifications: a hierarchy of Transformers containing a new convolutional token embedding, and a convolutional Transformer block leveraging a convolutional projection. These changes introduce desirable properties of convolutional neural networks (CNNs) to the ViT architecture (i.e. shift, scale, and distortion invariance) while maintaining the merits of Transformers (i.e. dynamic attention, global context, and better generalization). We validate CvT by conducting extensive experiments, showing that this approach achieves state-of-the-art performance over other Vision Transformers and ResNets on ImageNet-1k, with fewer parameters and lower FLOPs. In addition, performance gains are maintained when pretrained on larger datasets (e.g. ImageNet-22k) and fine-tuned to downstream tasks. Pretrained on ImageNet-22k, our CvT-W24 obtains a top-1 accuracy of 87.7% on the ImageNet-1k val set. Finally, our results show that the positional encoding, a crucial component in existing Vision Transformers, can be safely removed in our model, simplifying the design for higher resolution vision tasks. Code will be released at https://github.com/leoxiaobin/CvT.
An obstacle to artificial general intelligence is set by continual learning of multiple tasks of a different nature. Recently, various heuristic tricks, from both machine learning and neuroscience angles, have been proposed, but they lack a unified theoretical grounding. Here, we focus on continual learning in single-layered and multi-layered neural networks of binary weights. A variational Bayesian learning setting is thus proposed, in which the neural network is trained in a field-space rather than the gradient-ill-defined discrete-weight space; furthermore, the weight uncertainty is naturally incorporated and modulates the synaptic resources among tasks. From a physics perspective, we translate variational continual learning into the Franz-Parisi thermodynamic potential framework, where the previous task knowledge acts as both a prior and a reference. The learning performance can therefore be analytically studied with mean-field order parameters, whose predictions coincide with numerical experiments using stochastic gradient descent methods. Our proposed principled frameworks also connect to elastic weight consolidation and to neuroscience-inspired metaplasticity, providing a theory-grounded method for real-world multi-task learning with deep networks.
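A minimal sketch of the field-space idea, in our own notation (the paper's exact parameterization may differ): each binary weight is given a mean-field variational posterior controlled by a continuous field, and gradients are taken with respect to the field.

```latex
% Mean-field variational posterior over a binary weight w \in \{-1,+1\},
% parameterized by a continuous field \theta:
q_\theta(w) \;=\; \frac{e^{\theta w}}{2\cosh\theta},
\qquad
\langle w \rangle_{q_\theta} \;=\; \tanh\theta .
% Training performs gradient descent on \theta (well-defined in field
% space), rather than on the discrete weight w (gradient-ill-defined);
% the spread of q_\theta encodes the weight uncertainty that modulates
% synaptic resources across tasks.
```

Under this parameterization, a sharply peaked posterior (large $|\theta|$) marks a weight that is important for previous tasks and should move little, which is the mechanism connecting this setting to elastic weight consolidation.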
Neural networks with recurrent asymmetric couplings are important for understanding how episodic memories are encoded in the brain. Here, we integrate the experimental observation of wide synaptic integration windows into a sequence-retrieval model with continuous-time dynamics. The model with non-normal neuron interactions is studied theoretically by deriving a random matrix theory for the Jacobian matrix of the neural dynamics. The spectra display several distinct features, such as rotational symmetry about the origin and the emergence of nested voids within the spectrum boundary, so the spectral density is highly non-uniformly distributed in the complex plane. The random matrix theory also predicts a transition to chaos; in particular, the edge of chaos provides computational benefits for the sequential retrieval of memories. Our work provides a systematic study of time-lagged correlations with arbitrary time delays, and may thus motivate future studies of a broad class of memory models, and even big-data analysis of biological time series.
Training large-scale deep neural networks consumes expensive computing resources, yet the weight matrices that make up the trained network remain hard to interpret. Here, we propose a mode decomposition learning that interprets the weight matrices as a hierarchy of latent modes. These modes are akin to the patterns studied in the physics of memory networks. Mode decomposition learning not only saves a significant amount of training cost, but also explains the network performance in terms of the leading modes. The mode learning scheme exhibits a progressively compact latent space across the network hierarchy, and the minimal number of modes grows only logarithmically with the network width. Our mode decomposition learning is also studied in an analytic online-learning setting, which reveals multiple stages of the learning dynamics. The proposed mode decomposition learning therefore points to a cheap and interpretable route toward the magical deep learning.
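The latent-mode view can be illustrated with a toy rank-truncation: a weight matrix is written as a sum of outer-product modes, and only the leading (largest-amplitude) modes are kept. The matrix and modes below are hand-picked hypothetical values, not a trained network:

```python
# Toy sketch of a weight matrix as a sum of latent modes,
# W = sum_k s_k * u_k v_k^T, truncated to its leading mode.
# All vectors and amplitudes are hypothetical.
def rank_one(u, v, s):
    """Outer product s * u v^T as a list-of-lists matrix."""
    return [[s * ui * vj for vj in v] for ui in u]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# A 2x2 weight matrix composed of a strong mode and a weak mode.
u1, v1, s1 = [1.0, 1.0], [1.0, 0.0], 3.0    # leading mode
u2, v2, s2 = [1.0, -1.0], [0.0, 1.0], 0.5   # weak mode
W = add(rank_one(u1, v1, s1), rank_one(u2, v2, s2))
W_approx = rank_one(u1, v1, s1)             # keep only the leading mode
```

Learning the mode vectors and amplitudes directly, instead of every matrix entry, is what makes such a scheme cheaper when the number of needed modes grows only logarithmically with width.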